Preliminary Evaluation of MapReduce for High-Performance Climate Data Analysis

Authors

  • Daniel Q. Duffy
  • John L. Schnase
  • John H. Thompson
  • Shawn M. Freeman
  • Thomas L. Clune
Abstract

MapReduce is an approach to high-performance analytics that may be useful for data-intensive problems in climate research. It offers an analysis paradigm that runs on clusters of computers and combines distributed storage of large data sets with parallel computation. We are particularly interested in the potential of MapReduce to speed up basic operations common to a wide range of analyses. To evaluate this potential, we are prototyping a series of canonical MapReduce operations over a test suite of observational and climate simulation datasets. Our initial focus has been on averaging operations over arbitrary spatial and temporal extents within Modern Era Retrospective-Analysis for Research and Applications (MERRA) data. Preliminary results suggest this approach can improve efficiencies within data-intensive analytic workflows.

Keywords: MapReduce, Hadoop, high-performance analytics
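The averaging operation mentioned in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation and does not use Hadoop or the actual MERRA file format. The `Record` and `Extent` types, the field names, and the `run_average` driver are all hypothetical and exist only for illustration. The map step emits a partial `(sum, count)` pair for each record inside the requested spatial and temporal extent, and the reduce step combines those pairs so the mean can be computed at the end.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical flattened grid-point sample (not the MERRA format)."""
    variable: str
    month: int
    lat: float
    lon: float
    value: float


class Extent(NamedTuple):
    """Hypothetical inclusive bounds for the temporal and spatial query."""
    month_min: int
    month_max: int
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


Partial = Tuple[float, int]  # (running sum, running count)


def map_record(rec: Record, extent: Extent) -> Iterator[Tuple[str, Partial]]:
    """Map step: emit (variable, (value, 1)) for records inside the extent."""
    if (extent.month_min <= rec.month <= extent.month_max
            and extent.lat_min <= rec.lat <= extent.lat_max
            and extent.lon_min <= rec.lon <= extent.lon_max):
        yield rec.variable, (rec.value, 1)


def reduce_partials(key: str, partials: Iterable[Partial]) -> Tuple[str, Partial]:
    """Reduce step: add up partial sums and counts for one key."""
    total, count = 0.0, 0
    for part_sum, part_count in partials:
        total += part_sum
        count += part_count
    return key, (total, count)


def run_average(records: Iterable[Record], extent: Extent) -> Dict[str, float]:
    """In-memory stand-in for the shuffle phase plus the final division."""
    shuffled: Dict[str, List[Partial]] = defaultdict(list)
    for rec in records:
        for key, partial in map_record(rec, extent):
            shuffled[key].append(partial)
    averages: Dict[str, float] = {}
    for key, partials in shuffled.items():
        _, (total, count) = reduce_partials(key, partials)
        averages[key] = total / count
    return averages
```

Carrying `(sum, count)` pairs rather than intermediate means keeps the reduce step associative, so the same function could in principle also serve as a combiner on each node before data crosses the network.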


Related resources

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. Current Hadoop, together with the rack-aware data placement strategy of the existing Hadoop distributed file system, assumes a homogeneous cluster in which every node has the same computing capacity and is assigned the same workload. Default Hadoop d...

A MapReduce and MPI Programming Model for Distributed Large Scale 3D Mesh Processing

Developing a high-performance platform for large-scale, high-intensity data processing is a priority for research into cost-effective parallel finite element methods (FEM). This paper introduces an efficient MapReduce-MPI based strategy for parallel 3D finite element mesh processing and demonstrates the potential benefits of this approach for optimally utilizing system resources. Preliminary experim...

An Improved Performance Evaluation on Large-Scale Data using MapReduce Technique

Abstract: In day-to-day life, the volume of data has increased enormously over time. This growth is becoming unmanageable on social networking sites such as Facebook and Twitter. Over the past two years, data flows have been growing on the order of zettabytes. A number of applications have been developed to handle big data; however, analyzing big data remains a very challenging task. Big Data refers to...

An Empirical Evaluation of MapReduce under Interruptions

Interruptions are an unwanted but inevitable fact that all large-scale distributed computing systems have to face. They are more prevalent for MapReduce applications, because MapReduce often runs on clusters of commodity hardware, which are more vulnerable than traditional HEC systems. The problem is further exacerbated when running MapReduce application...

Towards Control of MapReduce Performance and Availability

MapReduce is a popular programming model for distributed data processing and Big Data applications. Extensive research has been conducted either to improve the dependability or to increase performance of MapReduce, ranging from adaptive and on-demand fault-tolerance solutions, adaptive task scheduling techniques to optimized job execution mechanisms. This paper investigates a novel solution tha...



Publication date: 2012